By A. Antoniadis*, G. Grégoire* and I. W. McKeague**

Université Joseph Fourier (Grenoble)* and Florida State University**

Abstract. The theory of wavelets is a developing branch of mathematics with a wide range of potential applications. Compactly supported wavelets are particularly interesting because of their natural ability to represent data with intrinsically local properties. They are useful for the detection of edges and singularities in image and sound analysis, and for data compression. However, most of the wavelet-based procedures currently available do not explicitly account for the presence of noise in the data. A discussion of how this can be done in the setting of some simple nonparametric curve estimation problems is given. Wavelet analogues of some familiar kernel and orthogonal series estimators are introduced, and their finite sample and asymptotic properties are studied. We discover that there is a fundamental instability in the asymptotic variance of wavelet estimators caused by the lack of translation invariance of the wavelet transform. This is related to the properties of certain lacunary sequences. The practical consequences of this instability are assessed.


AMS 1991 subject classifications. Primary: 62G07; Secondary: 60G05, 62G20.

Key words and phrases. Multiresolution analysis, nonparametric regression, hazard rate, kernel smoothing, orthogonal series, delta sequences.

1. Introduction

Wavelet theory has the potential to provide statisticians with powerful new techniques for nonparametric inference. It combines recent advances in approximation theory with insights gained from applied signal analysis; for a recent survey on the use of wavelets in signal processing, see Rioul and Vetterli [RV91], and for a recent discussion connecting wavelets with problems in nonparametric statistical inference, see Wegman [Weg91]. The mathematical side of wavelet theory has been developed by Yves Meyer [Mey90] and his coworkers in a long series of papers, see e.g. Mallat [Mal89], Daubechies [Dau90]; for a concise survey see Strang [Str89].

Consider the following standard nonparametric regression model involving an unknown regression function $r$:

$$Y_i = r(X_i) + \varepsilon_i, \qquad i = 1, \ldots, n.$$

Two versions of this model are distinguished in the literature:

(i) the fixed design model, in which the $X_i$'s are nonrandom design points (in this case the $X_i$'s are denoted $t_i$ and taken to be ordered $0 \le t_1 \le \cdots \le t_n \le 1$), with the observation errors $\varepsilon_i$ i.i.d. with mean zero and variance $\sigma^2$;

(ii) the random design model, in which the $(X_i, Y_i)$'s are independent and distributed as $(X, Y)$, with $r(x) = \mathbb{E}(Y \mid X = x)$ and $\varepsilon_i = Y_i - r(X_i)$.

** Partially supported by US Army Research Office Grant DAAL03-90-G-0103 and US Air Force Office of Scientific Research Grant AFOSR91-0048.

In each case the problem is to estimate the regression function $r(t)$ for $0 \le t \le 1$. We shall introduce wavelet versions of the most frequently used kernel and orthogonal series estimators for these models, as well as for the problem of hazard rate estimation in survival analysis. Our estimators are delta sequence smoothers based on wavelet kernels $E_m(\cdot, \cdot)$ as defined in Meyer [Mey90]. These kernels represent integral operators that project onto closed subspaces $V_m$ of $L^2(\mathbb{R})$. The increasing sequence of subspaces $V_m$ forms a so-called multiresolution analysis of $L^2(\mathbb{R})$. The basic idea (to be discussed at greater length in Section 2) is that the $V_m$ provide successive approximations, with details being added as $m$ increases. Thus $m$ acts as a tuning parameter, much as the bandwidth does for standard kernel smoothers. A key aspect of wavelet estimators is that the tuning parameter ranges over a much more limited set of values than is common with other nonparametric regression techniques. In practice only a small number of values of $m$ (say three or four) need to be considered. Despite this lack of control through a tuning parameter, which is in fact an advantage when it comes to cross validation, we shall see that wavelet estimators can compete effectively.

For the fixed design model we propose the estimator

$$\hat r(t) = \sum_{i=1}^{n} Y_i \int_{A_i} E_m(t, s)\, ds,$$

where the $A_i$ are intervals that partition $[0, 1]$ with $t_i \in A_i$. This is a wavelet version of Gasser and Müller's [GM79] (convolution) kernel estimator or of Härdle's ([Ha90], p. 51) orthogonal series estimator. For the random design model we propose

$$\tilde r(t) = n^{-1} \sum_{i=1}^{n} Y_i\, E_m(t, X_i) \big/ \hat f(t),$$

where $\hat f$ is a wavelet estimator of the density of $X$ given by

$$\hat f(t) = n^{-1} \sum_{i=1}^{n} E_m(t, X_i).$$

A standard kernel density estimator could be used in place of $\hat f$. The estimator $\tilde r$ is a wavelet version of the (evaluation) kernel estimator proposed by Nadaraya [Nad90] and Watson [Wat64].

It can also be viewed as a wavelet version of an orthogonal series estimator studied by Härdle [Ha84]. Antoniadis and Carmona [AC90] introduced density estimators of the form $\hat f$. In all these estimators the tuning parameter $m = m(n)$ needs to be chosen appropriately. A recent study of the relative merits of the convolution and evaluation kernel approaches to nonparametric regression has been made by Chu and Marron [CM91].
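To make the two definitions concrete, the following sketch evaluates $\hat r$ and $\tilde r$ for the Haar scaling function $\varphi = 1_{[0,1)}$, the one case in which the wavelet kernel has a simple closed form: $E_m(t, s) = 2^m$ when $t$ and $s$ lie in the same dyadic interval $[k2^{-m}, (k+1)2^{-m})$, and $0$ otherwise. The Haar choice, the function names and the midpoint construction of the cells $A_i$ are illustrative assumptions; the estimators studied in this paper use smoother Daubechies scaling functions, whose kernels have to be tabulated numerically.

```python
import numpy as np

def haar_kernel(t, s, m):
    """E_m(t, s) for the Haar scaling function phi = 1_[0,1):
    2^m if t and s lie in the same dyadic cell [k 2^-m, (k+1) 2^-m), else 0."""
    scale = 2.0 ** m
    return scale * (np.floor(scale * t) == np.floor(scale * np.asarray(s, dtype=float)))

def fixed_design_estimate(t, design, y, m):
    """r_hat(t) = sum_i Y_i * (integral of E_m(t, s) over A_i).
    Hypothetical choice of partition: A_i = [s_{i-1}, s_i) with s_i the midpoints
    between consecutive design points, s_0 = 0 and s_n = 1."""
    design = np.asarray(design, dtype=float)
    edges = np.concatenate(([0.0], (design[:-1] + design[1:]) / 2.0, [1.0]))
    scale = 2.0 ** m
    k = np.floor(scale * t)
    cell_lo, cell_hi = k / scale, (k + 1.0) / scale
    # For Haar, the integral over A_i equals 2^m * length(A_i intersected with the cell of t).
    overlap = np.minimum(edges[1:], cell_hi) - np.maximum(edges[:-1], cell_lo)
    weights = scale * np.clip(overlap, 0.0, None)
    return float(np.dot(weights, np.asarray(y, dtype=float)))

def random_design_estimate(t, x, y, m):
    """r_tilde(t) = n^-1 sum_i Y_i E_m(t, X_i) / f_hat(t), where
    f_hat(t) = n^-1 sum_i E_m(t, X_i) is the wavelet density estimate."""
    k = haar_kernel(t, x, m)
    f_hat = k.mean()
    if f_hat == 0.0:
        return float("nan")  # no observation falls in the dyadic cell of t
    return float(np.mean(k * np.asarray(y, dtype=float)) / f_hat)
```

With the Haar kernel, $\tilde r(t)$ reduces to the average of the $Y_i$ whose $X_i$ share the dyadic cell of $t$ (a regressogram), and $\hat r(t)$ to a length-weighted average over the same cell; smoother choices of $\varphi$ spread the weights over neighboring cells.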

Like wavelet estimators, orthogonal series estimators employ projections onto closed subspaces of $L^2(\mathbb{R})$ to represent successive approximations. However, the projections used by orthogonal series estimators are finite dimensional, whereas the projections used by the wavelet estimators are infinite dimensional. Wavelet estimators cannot be seen as location-adaptive kernel estimators either, cf. [BMP77]. In fact, wavelet estimators are properly regarded as delta sequence estimators, see Walter and Blum [WB79]: $\hat r$ is a special type of the delta sequence estimator studied recently by Isogai [Iso90]; $\tilde r$ is a special case of the estimator considered by Collomb [Co81] and studied recently by Doukhan [Do90]. We shall obtain consistency of $\hat r$ and $\tilde r$, for $\hat r$ by applying a result of Isogai.

We are also able to establish rate of convergence results for $\hat r$ and asymptotic normality results for suitably modified versions of $\hat r$ and $\tilde r$. For $\hat r$ we do this by adapting some techniques that were originally developed for kernel estimators by Gasser and Müller [GM79].

Eubank and Speckman [ES91] have studied rates of convergence for a least squares orthogonal series estimator for $r$. They used trigonometric series, and their method of proof is heavily dependent on the special properties of these systems. In order to avoid the need for periodic boundary conditions on the derivatives of $r$, they add appropriate polynomial terms to the orthogonal series. By using a least squares estimator constructed from an orthonormal wavelet basis of $L^2([0, 1])$, we show that the rates obtained by Eubank and Speckman hold without the need for more than just a linear correction to deal with the boundary behavior of $r$.

Most delta sequence estimators in statistics have a wavelet version that can be studied using techniques similar to those developed in this paper. We have focused our attention on the fixed design wavelet estimator $\hat r$. The paper is organized as follows. Section 2 reviews some background on wavelet theory. Wavelet estimators for nonparametric regression are discussed in Section 3, and for hazard rates in Section 4. Section 5 contains a discussion of applications to real data and a comparison of kernel and wavelet estimators. Proofs are collected in Section 6.

2. Some background on wavelets

This section is devoted to a brief introduction to the theory of wavelets that will be used in the sequel. We limit ourselves to the basic definitions and the main properties of wavelets. For more information, including proofs of the theorems in full generality and more extensive discussion and examples, see Meyer [Mey90], Mallat [Mal89], Daubechies [Dau90], Chui [Ch92].

Computing with wavelets requires a description of two basic functions, the scaling function $\varphi(x)$ and the primary wavelet $\psi(x)$. The function $\varphi(x)$ is a solution of a two-scale difference equation

$$\varphi(x) = \sum_{k \in \mathbb{Z}} c_k\, \varphi(2x - k) \qquad (2.1)$$

with normalization

$$\int_{\mathbb{R}} \varphi(x)\, dx = 1.$$

The function $\psi(x)$ is defined by

$$\psi(x) = \sum_{k \in \mathbb{Z}} (-1)^k c_{1-k}\, \varphi(2x - k). \qquad (2.2)$$

The coefficients $c_k$ are called the filter coefficients, and it is from careful choice of these that wavelet functions with desirable properties can be constructed. The condition

$$\sum_k c_k = 2$$

ensures the existence of a unique $L^1(\mathbb{R})$ solution to (2.1), see Daubechies and Lagarias ([DL88a], Theorem 2.1, p. 8). A wavelet system is the infinite collection of translated and scaled versions of $\varphi$ and $\psi$ defined by

$$\varphi_{j,k}(x) = 2^{j/2}\, \varphi(2^j x - k), \qquad j, k \in \mathbb{Z},$$

$$\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k), \qquad j, k \in \mathbb{Z}.$$
An additional condition on the filter coefficients,

$$\sum_k c_k\, c_{k+2\ell} = \begin{cases} 2 & \text{if } \ell = 0, \\ 0 & \text{if } \ell \neq 0, \end{cases}$$

together with some other regularity conditions, implies that $\{\psi_{j,k},\ j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2(\mathbb{R})$, and $\{\varphi_{j,k},\ k \in \mathbb{Z}\}$ is an orthonormal system in $L^2(\mathbb{R})$ for each $j \in \mathbb{Z}$; see Daubechies ([Dau90], Lemma 3.4, p. 958). A key observation of Daubechies ([Dau90], Section 4) is that it is possible to construct finite-length sequences of filter coefficients satisfying all of these conditions, resulting in compactly supported $\varphi$ and $\psi$.
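The filter conditions above are easy to check numerically. The sketch below tests $\sum_k c_k = 2$ and the orthogonality condition for the Haar filter and for the four-term Daubechies filter ($N = 2$) written in the normalization $\sum_k c_k = 2$; it also forms the wavelet coefficients assuming (2.2) has the common form $\psi(x) = \sum_k (-1)^k c_{1-k}\varphi(2x - k)$. The function names are hypothetical, and passing these algebraic checks does not by itself establish the additional regularity conditions mentioned in the text.

```python
import numpy as np

def satisfies_filter_conditions(c, tol=1e-12):
    """Check sum_k c_k = 2 and sum_k c_k c_{k+2l} = 2 (l = 0), 0 (l != 0).
    Only l >= 0 is checked; l < 0 gives the same sums after reindexing."""
    c = np.asarray(c, dtype=float)
    if abs(c.sum() - 2.0) > tol:
        return False
    for shift in range(0, len(c), 2):  # shift = 2l
        s = np.dot(c[:len(c) - shift], c[shift:])
        if abs(s - (2.0 if shift == 0 else 0.0)) > tol:
            return False
    return True

def wavelet_filter(c):
    """Coefficients d_k = (-1)^k c_{1-k} of psi(x) = sum_k d_k phi(2x - k),
    returned as a dict {k: d_k} over the k for which c_{1-k} is defined."""
    L = len(c)
    return {k: (-1) ** k * c[1 - k] for k in range(2 - L, 2)}

haar = [1.0, 1.0]
r3 = np.sqrt(3.0)
daub2 = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]  # N = 2
```

For the Haar filter the wavelet coefficients are $d_0 = 1$, $d_1 = -1$, i.e. $\psi(x) = \varphi(2x) - \varphi(2x - 1)$; for any admissible filter the $d_k$ sum to zero, so $\psi$ has mean zero.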

The simplest example of a wavelet system is the Haar system, defined by setting $c_0 = c_1 = 1$ and all other $c_k = 0$. In this case both the scaling function and the primary wavelet are supported by the interval $[0, 1]$, and the resulting system is an orthonormal basis of $L^2(\mathbb{R})$. However, if instead of a general function in $L^2(\mathbb{R})$ one wants to analyze a function with much less or much more regularity, the expansion given by the Haar system is inappropriate, the reason being that the coefficients either do not make any sense or their decay at infinity is bad. Replacing the scaling function in the Haar system by a more regular function produces a system with much better behavior with respect to spaces of smooth functions. The regularity of the scaling function $\varphi$ is defined in the following sense:


Definition 2.1. A scaling function $\varphi$ is $q$-regular ($q \in \mathbb{N}$) if for any $\ell \le q$ and for any integer $k$ one has

$$\left| \frac{d^\ell \varphi}{dx^\ell}(x) \right| \le C_k\, (1 + |x|)^{-k},$$

where $C_k$ is a generic constant depending only on $k$.


We assume throughout that $\varphi$ is $q$-regular for some $q \in \mathbb{N}$. Of course the primary wavelet inherits the regularity of the scaling function. Moreover, if $\varphi$ is regular enough, the resulting wavelet orthonormal basis provides unconditional bases for most of the usual function spaces, see Meyer [Mey90]. In order to obtain such a result, Mallat [Mal89] introduced the notion of a multiresolution analysis, the definition of which we recall here:


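Definition 2.2. A multiresolution analysis of $L^2(\mathbb{R})$ consists of an increasing sequence of closed subspaces $V_j$, $j \in \mathbb{Z}$, of $L^2(\mathbb{R})$ such that

(a) $\bigcap_j V_j = \{0\}$;

(b) $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$;

(c) there exists a scaling function $\varphi \in V_0$ such that $\{\varphi(\cdot - k),\ k \in \mathbb{Z}\}$ is an orthonormal basis of $V_0$;

and for all $h \in L^2(\mathbb{R})$:

(d) for all $k \in \mathbb{Z}$, $h(\cdot) \in V_0 \iff h(\cdot - k) \in V_0$;

(e) $h(\cdot) \in V_j \iff h(2\,\cdot) \in V_{j+1}$.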

The intuitive meaning of (e) is that in passing from $V_j$ to $V_{j+1}$ the resolution of the approximation is doubled. Mallat [Mal89] has shown that given any multiresolution analysis it is possible to derive a function $\psi$ (the primary wavelet) such that the family $\{\psi_{j,k},\ k \in \mathbb{Z}\}$ is an orthonormal basis of the orthogonal complement $W_j$ of $V_j$ in $V_{j+1}$, so that $\{\psi_{j,k},\ j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2(\mathbb{R})$. Conversely, the compactly supported wavelet systems mentioned earlier give rise to multiresolution analyses of $L^2(\mathbb{R})$; see Daubechies ([Dau90], Theorem 3.6). When the scaling function is $q$-regular, the corresponding multiresolution analysis is said to be $q$-regular.
Let us now introduce the following projector and its associated integral kernel:

$$h \mapsto E_j(h) = \int E_j(\cdot, y)\, h(y)\, dy = \text{projection of } h \text{ onto } V_j, \qquad E_j(x, y) = \sum_{k \in \mathbb{Z}} \varphi_{j,k}(x)\, \varphi_{j,k}(y).$$

It is easy to see that $E_j(x, y) = 2^j E_0(2^j x, 2^j y)$ and that $E_0(x + k, y + k) = E_0(x, y)$ for $k \in \mathbb{Z}$. Obviously, $E_0$ is not a convolution kernel, but the regularity of $\varphi$ implies that it is bounded above by a convolution kernel, that is, $|E_0(x, y)| \le K(x - y)$, where $K$ is some positive, bounded, integrable function satisfying moment conditions, see Meyer ([Mey90], p. 33). This remark will be exploited in the following sections. In particular, the bound $\sup_{x,y} |E_j(x, y)| = O(2^j)$ is often needed. We also mention some other useful properties. For any polynomial $p$ of degree $\le q$ one has

$$E_j(p) = p, \qquad (2.3)$$

see Meyer ([Mey90], p. 38). By (2.3) applied to $p(x) = 1$ and part (c) of the definition of a multiresolution analysis we see that

$$\sum_{k \in \mathbb{Z}} \varphi(x - k) = 1. \qquad (2.4)$$
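For the Haar system the projection kernel is explicit: $E_0(x, y) = 1$ when $\lfloor x \rfloor = \lfloor y \rfloor$ and $0$ otherwise, which makes the properties listed above easy to verify numerically. The sketch below (hypothetical helper names; Haar chosen only because its kernel is explicit) builds $E_j$ directly from the sum over $\varphi_{j,k}$, so that the scaling relation $E_j(x, y) = 2^j E_0(2^j x, 2^j y)$, integer-translation invariance, the partition of unity (2.4), and (2.3) for the constant polynomial (the only case the Haar system reproduces) can all be checked against it.

```python
import numpy as np

def phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def kernel(j, x, y, ks=range(-64, 64)):
    """E_j(x, y) = sum_k phi_{j,k}(x) phi_{j,k}(y), phi_{j,k}(x) = 2^{j/2} phi(2^j x - k).
    The finite range of k is exact as long as 2^j x and 2^j y lie in [-64, 64)."""
    return sum(2.0 ** j * phi(2.0 ** j * x - k) * phi(2.0 ** j * y - k) for k in ks)
```

For instance, `kernel(3, 0.3, 0.31)` equals $8 = 2^3$ because $0.3$ and $0.31$ share the cell $[0.25, 0.375)$, and a Riemann sum of `kernel(3, 0.3, y)` over a fine grid of $y$ values is close to $1$, in line with $E_j(1) = 1$.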

If a function $h$ belongs to the Sobolev space $H^\nu = H^\nu(\mathbb{R})$, then the sequence $E_j(h)$ converges strongly to $h$ in $H^\gamma$ for $|\gamma| \le \nu$, and

$$\|h - E_j(h)\|_0 = o(2^{-j\nu}) \qquad (2.5)$$

for $0 < \nu < q$, by Mallat ([Mal89], Theorem 3), where $\|\cdot\|_\gamma$ denotes the norm associated with $H^\gamma$ (so that $\|\cdot\|_0$ is the $L^2$ norm). The Sobolev space $H^\nu(\mathbb{R}^d)$, $\nu \in \mathbb{R}$, $d \ge 1$, is defined to be the space of tempered distributions whose Fourier transforms are square-integrable with respect to the measure $(1 + |x|^2)^\nu\, dx$ on $\mathbb{R}^d$; see Hörmander ([Ho89], p. 240).

Compactly supported wavelets are partitioned by the wavelet number $N$ into families whose scaling functions have supports of equal size. $N$ is defined as $(K_{\max} - K_{\min} + 1)/2$, where $K_{\max}$ is the least odd integer and $K_{\min}$ is the greatest even integer such that $c_k \neq 0 \Rightarrow K_{\min} \le k \le K_{\max}$. Thus $N$ is generically one half the number of nonzero filter coefficients. The support of $\varphi$ is the interval $[K_{\min}, K_{\max}]$ and the support of $\psi$ is the interval $[1 - N, N]$; note that both support widths are $2N - 1$ unit intervals long. The examples constructed by Daubechies have the property that their support widths increase linearly with their regularity. This is illustrated by Figure 2.1. Daubechies shows that there exists $\lambda > 0$ such that ${}_N\varphi, {}_N\psi \in C^{\lambda N}$, where $\varphi \in C^{n+\gamma}$ means that $\varphi \in C^n$ and $\varphi^{(n)}$ is Hölder continuous with exponent $\gamma$ ($0 < \gamma < 1$). More precisely, Daubechies and Lagarias ([DL88b], p. 62) obtain

$${}_2\varphi \in C^{0.5500\ldots}, \qquad {}_3\varphi \in C^{1.0878\ldots}, \qquad {}_4\varphi \in C^{1.6179\ldots}.$$

Figure 2.1. The scaling function ${}_N\varphi$ (left-hand column) and the corresponding wavelet ${}_N\psi$ (right-hand column) for $N = 3, 6$ and $8$. Note that the support widths increase with the regularity.

An algorithm described in Daubechies and Lagarias ([DL88a], p. 17), the cascade algorithm, allows us to construct the orthogonal compactly supported wavelets as limits of step functions which are finer and finer scale approximations of ${}_N\varphi$. The algorithm is easy to implement on a computer and converges very rapidly. Given a finite sequence of filter coefficients $c_0, \ldots, c_{2N-1}$, define the linear operator $A$ by

$$(Aa)_n = \sum_{k \in \mathbb{Z}} c_{n-2k}\, a_k, \qquad a = (a_k)_{k \in \mathbb{Z}},$$

where it is understood that $c_\ell = 0$ if $\ell < 0$ or $\ell > 2N - 1$. Define $a^j = A^j a^0$, where $(a^0)_0 = 1$ and $(a^0)_k = 0$ for $k \neq 0$. Set

$$\varphi_j(x) = \sum_{k \in \mathbb{Z}} a^j_k\, \chi(2^j x - k), \qquad (2.6)$$

where $\chi$ is the indicator function of the interval $[-\tfrac12, \tfrac12)$. Under certain conditions (see Daubechies [Dau90], p. 951), the sequence of functions $\varphi_j$ converges pointwise to a limit function $\varphi$ that satisfies the two-scale difference equation (2.1).
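The cascade iteration takes only a few lines of array code. In the sketch below the operator $A$ is applied as "upsample by 2, then convolve with the filter", which is exactly $(Aa)_n = \sum_k c_{n-2k} a_k$ when the filter is indexed from $c_0$. The Daubechies $N = 2$ coefficients are included only as an example input (normalized so that $\sum_k c_k = 2$), the function names are hypothetical, and no convergence test is attempted; entry $n$ of the returned vector approximates $\varphi(n/2^j)$.

```python
import numpy as np

def cascade(c, levels):
    """Return a^levels, where a^0 = delta_0 and a^{j+1} = A a^j with
    (A a)_n = sum_k c_{n-2k} a_k.  The filter c is indexed from c_0, so
    entry n of the result sits at x = n / 2**levels."""
    c = np.asarray(c, dtype=float)
    a = np.array([1.0])
    for _ in range(levels):
        up = np.zeros(2 * len(a) - 1)
        up[::2] = a               # a_k moved to position 2k
        a = np.convolve(up, c)    # (A a)_n = sum_m c_{n-m} up_m
    return a

def phi_step(c, levels, x):
    """Step function phi_j(x) = sum_k a^j_k chi(2^j x - k), chi = 1_[-1/2, 1/2)."""
    a = cascade(c, levels)
    n = np.floor(2.0 ** levels * np.asarray(x, dtype=float) + 0.5).astype(int)
    inside = (n >= 0) & (n < len(a))
    return np.where(inside, a[np.clip(n, 0, len(a) - 1)], 0.0)

r3 = np.sqrt(3.0)
daub2 = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]
```

For the Haar filter $c_0 = c_1 = 1$ every iterate is a vector of ones, so $\varphi_j$ is the indicator of $[0, 1)$ up to the half-cell offset of $\chi$. For any filter with $\sum_k c_k = 2$ each step doubles $\sum_k a^j_k$, so $\int \varphi_j = 2^{-j}\sum_k a^j_k = 1$ is preserved at every level.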

3. Nonparametric regression

In this section we establish consistency of $\hat r$ using a theorem of Isogai [Iso90]. Also, under conditions on the regression function $r$ that are weaker than the usual smoothness assumptions, we give asymptotic bounds for the bias and variance of $\hat r$ and establish asymptotic normality for a modified version of $\hat r$. This modified version of $\hat r$ is an approximation that agrees with $\hat r$ at dyadic points of the form $k2^{-m}$; it is needed to stabilize the variance. At the non-dyadic points, the variance of $\hat r$ itself is unstable because of irregularity in the wavelet kernels. In practice, the "optimal" bandwidth (that is, the resolution level $m$) can be selected by cross validation (see subsection 5.2 for further discussion). This usually amounts to a choice between at most three or four values of $m$. This small range of possibly optimal resolutions is very desirable, since the computational demands for $\hat r$ can be large.

The cascade algorithm described in Section 2 gives a simple method to calculate the estimators $\hat r$ and $\tilde r$. Note that the delta sequence $E_m$ used in $\hat r$ and $\tilde r$ can be written as

$$E_m(t, s) = 2^m \sum_{k \in \mathbb{Z}} \varphi(2^m t - k)\, \varphi(2^m s - k).$$

When $\varphi$ has compact support this is a finite sum, each term of which can be evaluated by the cascade algorithm. To evaluate the weights $\int_{A_i} E_m(t, s)\, ds$ used in $\hat r(t)$ we employ an integrated version of (2.6):

$$\int_u^v \varphi_j(x)\, dx = \sum_{k \in \mathbb{Z}} a^j_k \int_u^v \chi(2^j x - k)\, dx.$$

The sequence $\int_u^v \varphi_j(x)\, dx$ converges to $\int_u^v \varphi(x)\, dx$ for each $u < v$.
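The weights used by $\hat r$ follow from the expansion of $E_m$ after the substitution $x = 2^m s$: $\int_u^v E_m(t, s)\, ds = \sum_k \varphi(2^m t - k)\,[\Phi(2^m v - k) - \Phi(2^m u - k)]$, with $\Phi(x) = \int_{-\infty}^x \varphi$. The sketch below evaluates this with $\varphi$ replaced by the cascade step function $\varphi_J$, whose running integral can be computed exactly. It is self-contained, so it repeats a short cascade routine; the class and function names, the level $J = 10$ and the Daubechies $N = 2$ filter used in the check are assumptions for illustration.

```python
import numpy as np

def cascade(c, levels):
    """a^levels from the cascade algorithm (a^0 = delta_0, (A a)_n = sum_k c_{n-2k} a_k)."""
    a = np.array([1.0])
    for _ in range(levels):
        up = np.zeros(2 * len(a) - 1)
        up[::2] = a
        a = np.convolve(up, np.asarray(c, dtype=float))
    return a

class StepScaling:
    """phi_J(x) = sum_k a_k chi(2^J x - k), chi = 1_[-1/2, 1/2), and its exact
    running integral Phi_J(x) = integral of phi_J over (-inf, x]."""

    def __init__(self, c, levels=10):
        self.h = 2.0 ** -levels                      # cell width 2^-J
        self.a = cascade(c, levels)
        self.cum = np.concatenate(([0.0], np.cumsum(self.a))) * self.h

    def value(self, x):
        n = np.floor(np.asarray(x, dtype=float) / self.h + 0.5).astype(int)
        ok = (n >= 0) & (n < len(self.a))
        return np.where(ok, self.a[np.clip(n, 0, len(self.a) - 1)], 0.0)

    def integral(self, x):
        u = np.asarray(x, dtype=float) / self.h + 0.5   # position in cell units
        n = np.floor(u).astype(int)
        full = np.clip(n, 0, len(self.a))               # complete cells left of x
        ok = (n >= 0) & (n < len(self.a))
        partial = np.where(ok, self.a[np.clip(n, 0, len(self.a) - 1)] * (u - n) * self.h, 0.0)
        return self.cum[full] + partial

def kernel_weight(phi, t, u, v, m):
    """integral_u^v E_m(t, s) ds = sum_k phi(2^m t - k) [Phi(2^m v - k) - Phi(2^m u - k)]."""
    x = 2.0 ** m * t
    support = len(phi.a) * phi.h                     # right end of supp(phi_J), about 2N - 1
    ks = np.arange(int(np.floor(x - support)) - 1, int(np.floor(x)) + 2)
    coef = phi.value(x - ks)
    return float(np.sum(coef * (phi.integral(2.0 ** m * v - ks) - phi.integral(2.0 ** m * u - ks))))
```

Two sanity checks are natural here: summed over a partition that covers the support, the weights add up to one (because $\sum_k \varphi_J(x - k) = 1$ for every $x$), and shifting $t$, $u$ and $v$ by the dyadic amount $2^{-m}$ leaves each weight unchanged, reflecting the dyadic-translation invariance of $E_m$.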

Some plots of $E_m(t, s)$ for the scaling function ${}_3\varphi$ are given in Figure 3.1. Note that the wavelet kernels are dyadic-translation invariant: $E_m(t + u, \cdot) = E_m(t, \cdot - u)$ for all dyadic rationals $u$ of the form $k/2^m$, but not for general real numbers $u$. Also note the substantial variation in the form of the wavelet kernel as one passes between the dyadic points. This is more than just a variation in the local bandwidth: compare the curves corresponding to $t = .2$ and $t = .5$ in Figure 3.1(B). It appears that this feature of the wavelet kernel allows wavelet estimators to adapt automatically to local features of the regression function. An unfortunate side effect is that the asymptotic variance of wavelet estimators is unstable.

Figure 3.1. The wavelet kernel $E_m(t, s)$ for the scaling function ${}_3\varphi$: (A) perspective plot of $E_2(t, s)$; (B) $E_1(t, \cdot)$ for ten different values $t = .1, .2, \ldots, 1.0$ of $t$. Note the translation invariance $E_m(t + u, \cdot) = E_m(t, \cdot - u)$ for dyadic rationals $u$ of the form $k/2^m$.

Another reasonable estimator of $r$ is

$$\hat r_c(t) = \sum_{i=1}^{n} Y_i \int_{A_i} E_m(0, s - t)\, ds,$$

which is a convolution kernel estimator based on the kernel $K(t) = E_0(0, -t)$ and having bandwidth $2^{-m}$. A similar change can be made to $\tilde r$. Note that $\hat r$ and $\hat r_c$ agree at dyadic rationals of the form $k/2^m$. Asymptotic results for this estimator are special cases of those given in Gasser and Müller [GM79], although by using this special kernel $K$ we can relax the smoothness conditions on $r$. However, a finite sample comparison between $\hat r$ and $\hat r_c$ that examines their integrated mean squared errors for various values of $m$ shows that $\hat r$ is superior, see subsection 5.1. This is explained by the global approximation property (2.5) of the projection operator $E_m$ used in $\hat r$. Such a property is not available for $\hat r_c$. A general bandwidth might improve the performance of $\hat r_c$, which only uses bandwidths of the form $2^{-m}$. However, the heavy computational demands for such an estimator make any bandwidth cross validation selection procedure impractical.
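The agreement of $\hat r$ and the convolution estimator $\hat r_c(t) = \sum_i Y_i \int_{A_i} E_m(0, s - t)\, ds$ at dyadic rationals follows from the two properties of $E_j$ stated in Section 2, namely $E_j(x, y) = 2^j E_0(2^j x, 2^j y)$ and $E_0(x + k, y + k) = E_0(x, y)$ for integer $k$. A short derivation, for $t = k/2^m$ with $k \in \mathbb{Z}$:

```latex
\begin{aligned}
E_m(t, s) &= 2^m E_0(2^m t,\, 2^m s) = 2^m E_0(k,\, 2^m s)
          && \text{(scaling)}\\
          &= 2^m E_0(0,\, 2^m s - k) = 2^m E_0\big(0,\, 2^m (s - t)\big)
          && \text{(integer translation)}\\
          &= E_m(0,\, s - t) = 2^m K\big(2^m (t - s)\big),
          && K(u) = E_0(0, -u).
\end{aligned}
```

Multiplying by $Y_i$, integrating over $A_i$ and summing gives $\hat r(t) = \hat r_c(t)$. For non-dyadic $t$ the second step would require a non-integer shift, which $E_0$ does not respect, so the two estimators differ in general.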

Our first result gives consistency of $\hat r$.

Theorem 3.1. If $r$ is continuous at $t$, $m \to \infty$ and $\max_i |s_i - s_{i-1}| = o(2^{-m})$, then $\hat r(t)$ is mean square consistent.

Strong consistency of $\hat r(t)$ can be obtained under a more refined condition on the rate of increase of $m$ using Isogai's Theorem 3.2. In order to obtain deeper results we need the regression function $r$ and the density $f$ (in the random design case) to satisfy

(1) $r, f, rf \in H^\nu$ for some $\nu > 0$;

(2) $r$ and $f$ are Lipschitz of order $\gamma > 0$;

(3) $f$ does not vanish on $]0, 1[$.

Functions belonging to $H^\nu$ for $\nu > 3/2$ are continuously differentiable (see Trèves [Tre67], p. 331), so condition (2) is redundant when $\nu > 3/2$. We also need some additional assumptions on the scaling function $\varphi$:

(4) $\varphi$ has compact support;

(5) $\varphi$ is Lipschitz;

(6) $|\hat\varphi(\xi) - 1| = O(|\xi|)$ as $\xi \to 0$.

Here $\hat\varphi$ denotes the Fourier transform of $\varphi$. The compactly supported scaling functions ${}_N\varphi$, $N \ge 3$, satisfy all of these conditions. In particular, (6) holds by Daubechies ([Dau90], p. 963). For our asymptotic normality results we will need $\varphi$ to be regular of order $q > 1$. However, to obtain good rates of convergence for the mean square error of $\hat r$ we need to adapt the regularity of $\varphi$ to the smoothness of $r$:

(7) $\varphi$ is regular of order $q > \nu$.

A disadvantage of more regular wavelets is that their support is larger and therefore boundary effects are more pronounced. However, wavelet estimators based on more regular compactly supported wavelets are unbiased away from the boundary for higher order polynomials, see (2.3).

As in Gasser and Müller [GM79] for the fixed design case, to study the mean square error of $\hat r$ we assume that

$$\max_i |t_i - t_{i-1}| = O(n^{-1}). \qquad (3.1)$$

We shall also assume that, for some Lipschitz function $\kappa(\cdot)$,

$$\rho(n) \equiv \max_i \left| s_i - s_{i-1} - \frac{\kappa(s_i)}{n} \right| = o(n^{-1}), \qquad (3.2)$$

where $A_i = [s_{i-1}, s_i)$. This is a standard assumption for the fixed design model, but somewhat weaker than the "asymptotic equidistance" assumption of Gasser and Müller [GM79], in which $\kappa(t) \equiv 1$ and $\rho(n) = O(n^{-\delta})$ for some $\delta > 1$.

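The next result gives an asymptotic bound for the bias of $\hat r$.

Theorem 3.2.

$$\mathbb{E}\,\hat r(t) - r(t) = O(n^{-\gamma}) + O(\eta_m),$$

where

$$\eta_m = \begin{cases} 1/2^{m\nu} & \text{if } \nu < 1, \\ \sqrt{m}/2^m & \text{if } \nu = 1, \\ 1/2^m & \text{if } \nu > 1. \end{cases}$$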

In order to obtain an asymptotic expansion of the variance and an asymptotic normality result, we need to consider an approximation to $\hat r$ based on its values at dyadic points of order $m$. That is, define

$$\hat r_d(t) = \hat r(t_{(m)}),$$

where $t_{(m)} = [2^m t]/2^m$. Thus $\hat r_d$ is the piecewise constant approximation to $\hat r$ at resolution $2^{-m}$. The piecewise constant feature of $\hat r_d$ makes it an unattractive alternative to the unmodified estimator $\hat r$ (at least for small $m$). In particular, the bias is increased by an additional term of order $O(2^{-m\gamma})$. However, if one tries to obtain a precise asymptotic expansion of the variance of $\hat r(t)$, then a difficulty arises in that the variance is unstable as a function of $t$. This problem is avoided with $\hat r_d$.

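Theorem 3.3.

$$\operatorname{Var}\big(\hat r_d(t)\big) = \frac{\sigma^2 2^m}{n}\, \kappa(t) \big(w_0 + o(1)\big) + O\big(2^{2m} \rho(n)^2\big) + O(\cdots),$$

where $w_0 = \sum_{k \in \mathbb{Z}} \varphi^2(k)$. The variance of $\hat r(t)$ has this form, except that for general (non-dyadic) $t$ the leading term is only $O(2^m/n)$.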

From the proof of this theorem it can be seen that the leading term of the variance of $\hat r(t)$ is $\sigma^2 2^m n^{-1} \kappa(t)\, w^\varphi(t_m)$, where $t_m = 2^m t - [2^m t]$ and $w^\varphi$ is the function defined by

$$w^\varphi(u) = \int_{\mathbb{R}} E_0^2(u, v)\, dv.$$

Notice that for dyadic $t$ and $m$ sufficiently large, $t_m = 0$, so the variance of $\hat r(t)$ is asymptotically stable. But if $t$ is non-dyadic, then the sequence $t_m$ wanders around the unit interval and fails to converge. For example, at $t = 1/3$ it oscillates between $1/3$ ($m$ even) and $2/3$ ($m$ odd), so the variance oscillates between $w^\varphi(1/3)$ and $w^\varphi(2/3)$. The sequence $(t_m)$ belongs to the class of exponential lacunary sequences studied in ergodic theory. It is known that, except for at most countably many $t$, the sequence $t_m$ has infinitely many accumulation points (see Rauzy [Ra76], p. 67, Corollary 2.2). It is also interesting to note that for some irrational $t$ the sequence is eventually confined to a proper subinterval of $[0, 1]$, see ([Ra76], p. 69).
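The behavior of $t_m$ can be reproduced with exact rational arithmetic. Exactness matters here: every binary floating-point number is a dyadic rational, so computed in floating point $t_m$ is eventually $0$ for every $t$, which would hide the oscillation. The helper names below are hypothetical.

```python
from fractions import Fraction
import math

def t_m(t, m):
    """Fractional part t_m = 2^m t - floor(2^m t), computed exactly."""
    x = Fraction(t) * 2 ** m
    return x - math.floor(x)

def t_dyadic(t, m):
    """t_(m) = floor(2^m t) / 2^m, the dyadic point at which r_d evaluates r_hat."""
    x = Fraction(t) * 2 ** m
    return Fraction(math.floor(x), 2 ** m)

third = Fraction(1, 3)
print([str(t_m(third, m)) for m in range(6)])
# ['1/3', '2/3', '1/3', '2/3', '1/3', '2/3']
```

Passing the float `0.1` instead of `Fraction(1, 10)` illustrates the floating-point caveat: `t_m(0.1, m)` is exactly zero once $m \ge 55$, because the stored value of `0.1` is a dyadic rational.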

Plots of $w^\varphi$ for the Daubechies scaling functions ${}_N\varphi$, $N = 3, 5, 8$, are displayed in Figure 3.2. It can be seen that the variance of $\hat r(t)$ at non-dyadic $t$ can vary approximately by a factor of 3 for $N = 3$, and by a smaller factor for $N = 5$ and $8$. The variance of $\hat r$ is inflated over the variance of $\hat r_d$ by a factor of at most 1.75 for $N = 5$ and 1.19 for $N = 8$. Taking the larger bias of $\hat r_d$ into account, it appears that the unmodified estimator $\hat r$ is at least as efficient as $\hat r_d$, and it is $\hat r$ that we recommend in practice. Generally, higher regularity of the wavelet basis reduces instability in the asymptotic variance of $\hat r(t)$, although this comes at the expense of larger bias (the support of the scaling function increases with the regularity).


Figure 3.2. The function $w^\varphi(u)$, $0 \le u \le 1$, for ${}_3\varphi$ (solid line), ${}_5\varphi$ (dotted line) and ${}_8\varphi$ (dashed line).

For $N = 3, 5, 8$, the constants $w_0 = w^\varphi(0)$ are 1.81, 0.72, and 1.05, respectively. This suggests that ${}_5\varphi$ is more suitable than ${}_3\varphi$ or ${}_8\varphi$ when used in connection with $\hat r_d$. When used in connection with $\hat r$, there is little difference between ${}_5\varphi$ and ${}_8\varphi$.
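By the orthonormality of $\{\varphi(\cdot - k)\}$, $w^\varphi(u) = \int E_0^2(u, v)\, dv = \sum_k \varphi^2(u - k)$, so $w_0 = \sum_k \varphi^2(k)$ depends only on the values of $\varphi$ at the integers. These values can be obtained without the cascade algorithm: evaluating (2.1) at integer points gives the finite eigenvalue problem $\varphi(j) = \sum_k c_{2j-k}\varphi(k)$, normalized by (2.4). The sketch below (hypothetical function names) implements this. It assumes $\varphi$ vanishes at both ends of its support $[0, L-1]$, which holds when $c_0 \neq 1$ and $c_{L-1} \neq 1$ because (2.1) gives $\varphi(0) = c_0\varphi(0)$ and $\varphi(L-1) = c_{L-1}\varphi(L-1)$. It is checked only on the Daubechies $N = 2$ filter, whose integer values $\varphi(1) = (1+\sqrt3)/2$ and $\varphi(2) = (1-\sqrt3)/2$ are known in closed form and give $w_0 = 2$; reproducing the constants quoted above for $N = 3, 5, 8$ would require the corresponding filter coefficients, which are not listed in this paper.

```python
import numpy as np

def integer_values(c):
    """phi(1), ..., phi(L-2) for a scaling function supported on [0, L-1], L = len(c).
    By (2.1), phi(j) = sum_k c_{2j-k} phi(k); the vector is the eigenvector of this
    matrix for eigenvalue 1, normalized so that sum_k phi(k) = 1 as in (2.4)."""
    c = np.asarray(c, dtype=float)
    L = len(c)
    idx = range(1, L - 1)
    M = np.array([[c[2 * j - k] if 0 <= 2 * j - k < L else 0.0 for k in idx] for j in idx])
    vals, vecs = np.linalg.eig(M)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def w0(c):
    """w_0 = w^phi(0) = sum_k phi(k)^2, assuming phi(0) = phi(L-1) = 0."""
    return float(np.sum(integer_values(c) ** 2))

r3 = np.sqrt(3.0)
daub2 = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]
```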

Optimal rates. In order to give a rate of convergence for the mean squared error of their estimates, Gasser and Müller [GM79] assume that $r$ is $k$-times continuously differentiable and use a kernel of order $k \ge 2$. They find that the best rate of convergence for the mean squared error is $n^{-2k/(2k+1)}$. An analogous result holds in our case. Assume that $r$ is $k = q + 1$ times continuously differentiable, where $q$ is the regularity of the scaling function. Since polynomials of degree $\le q$ are invariant under $E_m(t, s)$, see (2.3), we get, by using a Taylor expansion of $r$, that the best rate of convergence for the mean squared error of $\hat r$ at dyadic points is the same as for the kernel estimator, and is attained by $m = \log_2 n/(2k + 1)$. It is worth stressing that the wavelet approach allows us to obtain rates under much weaker assumptions on $r$ than second order differentiability. For example, the triangular function having Fourier transform $\sin^2(\xi/2)/(\xi/2)^2$ belongs to $H^1$ and is Lipschitz of order 1, so it satisfies our conditions (1) and (2), but it is not differentiable. The mean squared error of $\hat r_d$ is of order $O(2^m/n) + O(\eta_m^2) + O(2^{-2m\gamma})$. The best rate is $n^{-2\nu^*/(2\nu^* + 1)}$, which is attained by $m = \log_2 n/(2\nu^* + 1)$, where $\nu^* = \min(1, \nu, \gamma) - \epsilon$, with $\epsilon = 0$ for $\nu \neq 1$ and $\epsilon > 0$ for $\nu = 1$.
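The choice $m = \log_2 n/(2\nu^* + 1)$ comes from balancing a variance term of order $2^m/n$ against a squared bias of order $2^{-2\nu^* m}$ (with $k$ in place of $\nu^*$ in the smooth case). The sketch below makes the balance explicit with all constants set to one, which is an assumption: in practice the constants (for instance $\sigma^2 w_0 \kappa(t)$) can shift the optimum by a level or so, and $m$ is chosen by cross validation as discussed earlier. Function names are hypothetical.

```python
import math

def mse_proxy(m, n, nu):
    """Order-of-magnitude MSE at resolution m: variance 2^m / n plus squared
    bias 2^(-2 nu m); all constants are set to 1 (an assumption)."""
    return 2.0 ** m / n + 2.0 ** (-2.0 * nu * m)

def best_level(n, nu, m_max=30):
    """Integer resolution level minimizing mse_proxy."""
    return min(range(m_max + 1), key=lambda m: mse_proxy(m, n, nu))

def balancing_level(n, nu):
    """Level m = log2(n) / (2 nu + 1) at which the two terms are equal."""
    return math.log2(n) / (2.0 * nu + 1.0)
```

For $n = 1000$ and $\nu = 1$ the balancing level is $\log_2 1000/3 \approx 3.3$, consistent with the remark that only a handful of resolution levels need to be considered; at the balancing level the proxy equals $2\,n^{-2\nu/(2\nu+1)}$.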


Our next result concerns asymptotic normality of $\hat r_d$. It can be applied to the unmodified estimator $\hat r$ at dyadic points.

Theorem 3.4. If $n2^{-m} \to \infty$ and $n2^{-m(2\nu^*+1)} \to 0$, then $\sqrt{n2^{-m}}\,\big(\hat r_d(t) - r(t)\big)$ is asymptotically normal with zero mean and variance $\sigma^2 w_0\, \kappa(t)$.

We now turn to the estimator $\tilde r$ used in the random design model. Much of the above discussion carries over to this case. The following result gives consistency of $\tilde r$.

Theorem 3.5. If $m \to \infty$ and $n2^{-m} \to \infty$, then $\hat f(t)$ is consistent and, if in addition $\mathbb{E}(Y^2 \mid X = x)$ is bounded for $x$ belonging to a neighborhood of $t$, then $\tilde r(t)$ is consistent.

A result of Doukhan ([Do90], Theorem 1) dealing with general delta sequence estimates can be used to establish uniform strong consistency of $\tilde r$, but under more stringent conditions on the rate of increase of $m$. Conditions (1)-(6) of Doukhan's paper are easily checked along the lines that we check Isogai's conditions in the proof of Theorem 3.1, and using (6.2). As for the fixed design model, in order to obtain an asymptotic distribution result (at all $t$), we need to consider the piecewise constant approximation $\tilde r_d(t) = \tilde r(t_{(m)})$ instead of $\tilde r$.

Theorem 3.6. Suppose that for some $\epsilon > 0$ we have $\mathbb{E}(|Y|^{2+\epsilon} \mid X = x)$ bounded for $x$ belonging to a neighborhood of $t$, $n2^{-m} \to \infty$ and $n2^{-m(2\nu^*+1)} \to 0$. Then $\sqrt{n2^{-m}}\,\big(\tilde r_d(t) - r(t)\big)$ is asymptotically normal with zero mean and variance $\operatorname{Var}(Y \mid X = t)\, w_0 / f(t)$.

Symmetrized wavelet estimators. Inspecting Figure 3.2(B), it can be seen that the wavelet kernels $E_m(t,s)$ lack symmetry about the point $t$, an asymmetry inherited from the scaling functions; see Figure 2.1. This is somewhat unnatural from a statistical point of view, since applying the estimator to time-reversed data produces an estimate different from the time-reversed $\hat r$ (denoted $\hat r_{rev}$). Unfortunately, except for the Haar basis, there exists no compactly supported wavelet basis in which the scaling function is symmetric around any axis; see Daubechies [Dau90]. Another difficulty is caused by the excessive weight placed at points far to the left of $t$, resulting in a pronounced edge effect at the lower limit of the design interval (see the discussion of the voltage data example in subsection 5.3). A simple way of correcting these flaws in $\hat r$ is to use a weighted average of $\hat r$ and $\hat r_{rev}$ with weights depending on the evaluation point:

$$\hat r_{sym}(t) = t\,\hat r(t) + (1 - t)\,\hat r_{rev}(t).$$

It is easily seen that this estimator inherits the properties of $\hat r$ proved above. A similar modification can be made to any of the wavelet estimators considered in this paper.
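A minimal sketch of the symmetrization, assuming any estimator that can be called as `estimate(t, x, y)` on the design interval $[0,1]$. The time-reversed estimate is obtained by applying the same estimator to the reflected data $(1 - x_i, y_i)$ and evaluating it at $1 - t$. The helper `right_window_mean` is a hypothetical, deliberately one-sided smoother used only to make the effect of time reversal visible; it is not one of the paper's estimators.

```python
import numpy as np


def symmetrize(estimate, t, x, y):
    """r_sym(t) = t * r_hat(t) + (1 - t) * r_rev(t) on the design interval [0, 1].

    `estimate(t, x, y)` is any estimator of r(t).  The time-reversed estimate
    r_rev(t) applies the same estimator to the reflected data (1 - x, y) and
    evaluates it at the reflected point 1 - t.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_fwd = estimate(t, x, y)
    r_rev = estimate(1.0 - t, 1.0 - x, y)
    return t * r_fwd + (1.0 - t) * r_rev


def right_window_mean(t, x, y, h=0.25):
    """Hypothetical one-sided smoother: mean of the Y_i with X_i in [t, t + h)."""
    mask = (x >= t) & (x < t + h)
    return y[mask].mean() if mask.any() else float("nan")


x = [0.1, 0.3, 0.5, 0.7, 0.9]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
r_sym_mid = symmetrize(right_window_mean, 0.5, x, y)
```

At $t = 0.5$ the right-window mean is 3.5 and the reflected (effectively left-window) mean is 2.5, so the equal weights give 3.0; near $t = 0$ the weight shifts almost entirely to the reversed estimate, which is what counteracts the lower-boundary edge effect.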


Confidence intervals. In order to use our asymptotic normality result to obtain confidence intervals for $r(t)$ at a given $t$, one needs to estimate the noise variance consistently. In the fixed design case the noise variance is $\sigma^2$. We suggest using the following estimate of Müller [Mu85]:


'I-l 


0-'  = 


I 


obtained by fitting constants to successive triples of the data. Lemma 2 of Müller shows that if the regression function is Hölder continuous of order 1, then $\hat\sigma^2$ is almost surely consistent and


V  W  2  ' 


\ 


a.s. as $n\to\infty$ for any $\epsilon > 0$. In practice, to obtain a good impression of the errors involved in the point estimates $\hat r(t)$ of $r(t)$, it would be enough to provide confidence intervals at the $2^m$ dyadic points of the design region. For $m = 4$ this would give 16 confidence intervals.
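The dyadic-point intervals can be sketched as follows, assuming the Haar kernel, an equispaced fixed design with cells $A_i = [(i-1)/n, i/n)$ and $n$ a multiple of $2^m$, and the normal quantile 1.96. In place of the estimate of Müller quoted above, this sketch uses a second-difference variance estimator, $\hat\sigma^2 = \{6(n-2)\}^{-1}\sum_{i=2}^{n-1}(Y_{i-1} - 2Y_i + Y_{i+1})^2$, which is an assumption of the illustration and not the paper's formula. Under these assumptions $\hat r(k2^{-m})$ is the average of $n2^{-m}$ observations, so its variance is exactly $\sigma^2 2^m/n$ (here $w_0^2 = 1$). The function names are hypothetical.

```python
import numpy as np


def second_difference_variance(y):
    """Difference-based estimate of sigma^2 for an equispaced design:
    sum_{i=2}^{n-1} (Y_{i-1} - 2 Y_i + Y_{i+1})^2 / (6 (n - 2))."""
    y = np.asarray(y, dtype=float)
    d = y[:-2] - 2.0 * y[1:-1] + y[2:]
    return np.sum(d**2) / (6.0 * (len(y) - 2))


def dyadic_confidence_intervals(y, m, z=1.96):
    """Approximate pointwise intervals for r(k / 2^m), k = 0, ..., 2^m - 1.

    Assumes the Haar kernel and n a multiple of 2^m: r_hat(k / 2^m) is then the
    mean of the n / 2^m observations in the k-th dyadic cell, with variance
    sigma^2 * 2^m / n.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n % 2**m:
        raise ValueError("this sketch assumes n is a multiple of 2^m")
    r_hat = y.reshape(2**m, n // 2**m).mean(axis=1)
    half_width = z * np.sqrt(second_difference_variance(y) * 2**m / n)
    points = np.arange(2**m) / 2**m
    return points, r_hat - half_width, r_hat + half_width


# m = 4 gives the 16 intervals mentioned in the text.
rng = np.random.default_rng(0)
design = (np.arange(64) + 0.5) / 64
points, lower, upper = dyadic_confidence_intervals(np.sin(2 * np.pi * design) + 0.3 * rng.normal(size=64), 4)
```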


Least squares wavelet regression. Orthogonal series used for least squares regression should form a basis of the $L^2$-space on the design region, i.e. $L^2([0,1])$. The wavelets described up to now form an orthonormal basis of $L^2(\mathbb R)$ and are not appropriate. Instead, we shall use a wavelet orthonormal basis $\{\psi_{j,k}: j\ge 1,\ k\in S_j\}$ of $L^2([0,1])$ constructed by Jaffard and Meyer [JM89]. Here $S_j$ is a subset of $\mathbb Z$, defined as $R_j$ in Jaffard and Meyer ([JM89], p. 95). For some integer $j_0$ depending on $q$, the set $S_j$ is empty for $j < j_0$. These wavelets belong to the space where $q > 2$ and the subscript 0 indicates support within $]0,1[$. They are defined through a multiresolution analysis of $L^2([0,1])$ and form unconditional bases of $H_0^\nu$, $0 < \nu < 2q - 2$. Assume that $r(0) = r(1) = 0$ and $r\in H_0^\nu$. This is a weaker assumption than condition (ii) of Theorem 1 of Eubank and Speckman [ES91], but the boundary condition $r(0) = r(1) = 0$ is still rather restrictive. It can be removed by adding a linear function to the regression analysis; cf. Eubank and Speckman [ES91].

We shall obtain a rate of convergence for the mean squared error

$$R(\hat r_{LS}) = n^{-1}\sum_{i=1}^n \mathbb E\big(r(X_i) - \hat r_{LS}(X_i)\big)^2$$

of the least squares wavelet estimator $\hat r_{LS}$ given by

$$\hat r_{LS}(t) = \sum_{j=1}^{m}\sum_{k\in S_j}\hat d_{j,k}\,\psi_{j,k}(t),$$

where the $\hat d_{j,k}$ are obtained by least squares. The number $D_m = \sum_{j=j_0}^{m}|S_j|$ of functions $\psi_{j,k}$ used in the regression is bounded above by a constant multiple of $2^m$. We assume that the observation errors have constant variance $\sigma^2$. Let $G_n$ denote the empirical distribution function of the design points $X_i$ and assume that $\delta_n = \sup_t|G_n(t) - G(t)|\to 0$, where $G$ is some distribution function that is absolutely continuous with density bounded away from zero and infinity. Typically $\delta_n$ is of order $O(n^{-1})$ in the fixed design case, and of order $O\big((n^{-1}\log\log n)^{1/2}\big)$ in the random design case; see Eubank and Speckman [ES91] for further discussion.

Theorem 3.7. If $r\in H_0^\nu$, where $\nu > 1$ is an integer, then
R(hs)  <  Oi2-^n  +  <r^Dmln  + 

This  rate  of  convergence  essentially  agrees  with  the  rate  given  in  Theorem  1  (iii)  of  Eubank 
and  Speckman  [ES91]. 
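The mechanics of least squares wavelet regression can be sketched as below. The Jaffard-Meyer basis is not available in standard numerical libraries, so this hypothetical sketch swaps in the Haar system on $[0,1]$ (the constant function plus $\psi_{j,k}(x) = 2^{j/2}\psi(2^jx - k)$ for $j = 0,\dots,m-1$; note the indexing differs from the paper's), which is orthonormal in $L^2([0,1])$ and needs no boundary condition, giving $D_m = 2^m$ regressors. Haar functions are not smooth, so Theorem 3.7 does not apply to this choice; the code only shows how the coefficients $\hat d_{j,k}$ are obtained by least squares.

```python
import numpy as np


def haar_basis(x, m):
    """Design matrix of the 2^m orthonormal Haar functions on [0, 1]:
    the constant 1 and psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), j = 0..m-1,
    k = 0..2^j - 1, where psi = 1_[0,1/2) - 1_[1/2,1)."""
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(x)]
    for j in range(m):
        for k in range(2**j):
            u = 2.0**j * x - k
            psi = ((u >= 0) & (u < 0.5)).astype(float) - ((u >= 0.5) & (u < 1)).astype(float)
            columns.append(2.0 ** (j / 2) * psi)
    return np.column_stack(columns)


def least_squares_wavelet_fit(x, y, m):
    """Least squares coefficients d_hat and fitted values at the design points."""
    B = haar_basis(x, m)
    d_hat, *_ = np.linalg.lstsq(B, np.asarray(y, dtype=float), rcond=None)
    return d_hat, B @ d_hat


x = (np.arange(8) + 0.5) / 8
d_hat, fitted = least_squares_wavelet_fit(x, np.arange(8.0), m=2)
```

With an equispaced design and the Haar system, the fitted values are simply the averages over the $2^m$ dyadic cells, which makes the sketch easy to check by hand.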


4. Hazard rate estimation

In this section we study a wavelet version of Ramlau-Hansen's [RH83] estimator of a hazard rate function. It turns out that most of the wavelet techniques we have used for nonparametric regression carry over to this setting. Since the work of Aalen [OAA], it is well known that hazard rate estimation can be viewed in the context of inference for a counting process multiplicative intensity model given by $\lambda(t) = \alpha(t)Y(t)$, where $Y(t)$ is a nonnegative observed process. In the usual survival analysis or reliability application, a portion $\tilde T = \min(T, C)$ of an individual's lifetime $T$ is observed, where $C$ is a censoring time (assumed to be independent of $T$). Data are available on $n$ individuals, with the corresponding $(T_i, C_i)$ independent and distributed as $(T, C)$. Suppose that $T$ has hazard rate function $\alpha$ and that the distribution function $H$ of $\tilde T$ is such that $H(1) < 1$. Then the counting process $N_n(t) = \sum_{i=1}^n I\{\tilde T_i\le t,\ T_i\le C_i\}$ has intensity $\alpha(t)Y_n(t)$, where $Y_n(t) = \sum_{i=1}^n I\{\tilde T_i\ge t\}$ is the number of individuals at risk at time $t$.

This is a special case of Aalen's multiplicative intensity model. In what follows, the notation is essentially the same as in Ramlau-Hansen.

Our wavelet estimator for the hazard function $\alpha$ is defined by

$$\hat\alpha(t) = \int_0^1 E_m(t,s)\,d\hat A(s), \qquad (4.1)$$

where $\hat A$ is the Nelson-Aalen estimator

$$\hat A(t) = \int_0^t \frac{J(s)}{Y(s)}\,dN(s),$$

$J(s) = I\{Y(s) > 0\}$, and $J(s)/Y(s)$ is defined to be 0 when $Y(s) = 0$. To obtain asymptotic results we index the processes $J$ and $Y$ by $n$. We use the same assumptions on $\alpha$ as were used for $r$ in the regression case. Also assume that there exists a positive function $\tau$, bounded away from zero and infinity, such that $\mathbb E\big[\sup_{0\le s\le 1}|nJ_n(s)/Y_n(s) - 1/\tau(s)|\big]\to 0$ as $n\to\infty$. This condition is easily checked in the survival analysis case described above.
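A minimal sketch of (4.1), assuming the Haar kernel and untied observation times in $[0,1]$. The Nelson-Aalen estimator $\hat A$ jumps by $1/Y(\tilde T_i)$ at each uncensored time, so with the Haar kernel $\hat\alpha(t)$ reduces to $2^m$ times the Nelson-Aalen increment over the dyadic cell containing $t$. The function names are hypothetical.

```python
import numpy as np


def nelson_aalen_jumps(times, events):
    """Jump locations and sizes 1 / Y(T_i) of the Nelson-Aalen estimator A_hat.

    times  : observed times min(T_i, C_i)
    events : 1 if the failure was observed (T_i <= C_i), 0 if censored
    Y(s) = #{j : observed time_j >= s} is the number at risk; ties are not handled.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    at_risk = np.array([(times >= s).sum() for s in times])
    return times[events], 1.0 / at_risk[events]


def haar_hazard(t, times, events, m):
    """alpha_hat(t) = int_0^1 E_m(t, s) dA_hat(s) with the Haar kernel, i.e.
    2^m times the Nelson-Aalen increment over the dyadic cell containing t."""
    jump_times, jump_sizes = nelson_aalen_jumps(times, events)
    same_cell = np.floor(2.0**m * t) == np.floor(2.0**m * jump_times)
    return 2.0**m * np.sum(jump_sizes[same_cell])


times = [0.1, 0.2, 0.3, 0.6]
events = [1, 1, 0, 1]          # the observation at 0.3 is censored
alpha_left = haar_hazard(0.25, times, events, m=1)   # 2 * (1/4 + 1/3)
```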

Define $\delta_n = \sup_{0\le s\le 1}\mathbb E(1 - J_n(s))$. Our first result implies that the wavelet estimate is asymptotically unbiased.

Theorem 4.1.

$$\mathbb E\hat\alpha(t) - \alpha(t) = O(\eta_m) + 2^{m/2}O(\delta_n^{1/2}),$$

where $\eta_m$ is defined in the statement of Theorem 3.2.

As in the regression case, it is convenient to approximate $\hat\alpha$ by an estimator based on the values of $\hat\alpha$ at dyadic points: $\hat\alpha_d(t) = \hat\alpha(t^{(m)})$, where $t^{(m)} = [2^m t]/2^m$. Observe that Theorem 4.1 holds for $\hat\alpha_d$ provided that we add $O(2^{-m\gamma})$ to the asymptotic expansion of the bias.

Theorem 4.2.

$$\mathbb E(\hat\alpha_d(t) - \alpha(t))^2 = \frac{2^m}{n}\frac{\alpha(t)}{\tau(t)}w_0^2 + 2^m o(n^{-1}) + O(\eta_m^2) + 2^m O(\delta_n) + O(2^{-2m\gamma}).$$

The mean squared error of $\hat\alpha(t)$ has the same form, except that for general (non-dyadic) $t$ the leading term is $O(2^m/n)$.

Under  n2"'”  co,  n2''“"’^’  — >  0  and  we  have  L"-consistency  of  ccdU').  The 

leading term in the asymptotic expansion of the mean squared error is then of order $O(2^m/n)$. If $\hat\alpha$ is used instead of $\hat\alpha_d$, then the Lipschitz condition on $\alpha$ is unnecessary.
Theorem 4.3. If $n2^{-m}\to\infty$, $n2^{-m(1+2\gamma)}\to 0$ and $\delta_n = o(n^{-1})$, then $\sqrt{n2^{-m}}\,(\hat\alpha_d(t) - \alpha(t))$ is asymptotically normal with zero mean and variance $\alpha(t)w_0^2/\tau(t)$.


5.  Practical  application  and  discussion 

5.1 Finite sample comparisons. So far we have only considered the asymptotic behavior of our estimates. However, as long as one deals with linear estimates and is interested in the mean squared error or the integrated mean squared error of these estimates for finite samples, numerical calculations are possible that approximate these quantities to any desired degree of accuracy when the true regression function, the error variance and the weights are known (other properties of the error probability law being irrelevant). The method that we are going to describe has been used by Gasser and Müller [GM84] for a finite sample comparison between cubic smoothing splines and various types of kernel estimates.

The method applies to linear estimates of the form $\hat r(t) = \sum_{i=1}^n W_i(t)Y_i$. For such estimates the bias at $t$ is $\sum_{i=1}^n W_i(t)\big(r(t_i) - r(t)\big)$ and the variance is $\sigma^2\sum_{i=1}^n W_i^2(t)$. The integrated mean squared error is obtained by numerically integrating variance + bias² over a fine grid of $t$'s.
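For a linear estimate these finite-sample quantities are exact sums, so they can be computed directly. The sketch below assumes the Haar kernel (not the Daubechies kernels behind Table 1, so it is not expected to reproduce those numbers), $n$ equal cells $A_i$ with their midpoints as design points $t_i$, the test function $r(t) = 2 - 2t + 3\exp(-(t-0.5)^2/0.01)$, $\sigma^2 = 0.2$, $n = 25$, and a 200-point grid on $[0.25, 0.75]$ integrated by the trapezoidal rule. The bias is computed as $\sum_i W_i(t)r(t_i) - r(t)$, which equals the expression above because the Haar weights sum to one. All names are hypothetical.

```python
import numpy as np


def haar_weights(t, edges, m):
    """W_i(t) = int_{A_i} E_m(t, s) ds for the Haar kernel, A_i = [edges[i], edges[i+1]).

    E_m(t, .) equals 2^m on the dyadic cell [l, l + 2^-m) containing t and 0 elsewhere,
    so W_i(t) is 2^m times the length of the overlap of A_i with that cell.
    """
    left = np.floor(2.0**m * t) / 2.0**m
    right = left + 2.0**-m
    overlap = np.minimum(edges[1:], right) - np.maximum(edges[:-1], left)
    return 2.0**m * np.clip(overlap, 0.0, None)


def r_test(t):
    """Test function 2 - 2t + 3 exp(-(t - 0.5)^2 / 0.01)."""
    return 2.0 - 2.0 * t + 3.0 * np.exp(-((t - 0.5) ** 2) / 0.01)


def integrated_bias2_and_variance(m, n=25, sigma2=0.2, a=0.25, b=0.75, n_grid=200):
    """Integrated squared bias and variance of the linear Haar estimate over [a, b]."""
    edges = np.linspace(0.0, 1.0, n + 1)
    t_design = 0.5 * (edges[:-1] + edges[1:])  # assumed design points: cell midpoints
    grid = np.linspace(a, b, n_grid)
    bias2 = np.empty(n_grid)
    var = np.empty(n_grid)
    for g, t in enumerate(grid):
        w = haar_weights(t, edges, m)
        bias2[g] = (w @ r_test(t_design) - r_test(t)) ** 2  # (sum_i W_i r(t_i) - r(t))^2
        var[g] = sigma2 * (w @ w)                            # sigma^2 sum_i W_i(t)^2
    dt = grid[1] - grid[0]

    def trapezoid(f):
        # composite trapezoidal rule on the equispaced grid
        return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

    return trapezoid(bias2), trapezoid(var)


for m in (1, 2, 3, 4):
    ib2, iv = integrated_bias2_and_variance(m)
    print(m, ib2, iv, ib2 + iv)
```

Scanning $m$ in this way shows the usual trade-off: the integrated variance grows roughly like $2^m$ while the integrated squared bias shrinks, and the optimal $m$ balances the two.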

We used the same underlying function as in Gasser and Müller, that is

$$r(t) = 2 - 2t + 3\exp\big(-(t - 0.5)^2/0.01\big),\qquad t\in[0,1],$$

and compared our wavelet estimator with a kernel estimate. The residual variance was taken as $\sigma^2 = 0.2$ and the sample size as $n = 25$. The results are presented in Table 1. The integrated mean squared error was evaluated using a grid of 200 equidistant points between 0.25 and 0.75. We restricted attention to an interval smaller than $[0,1]$ to avoid possible boundary effects. The wavelet kernels corresponding to four different scaling functions (${}_N\varphi$ for $N = 3, 5, 6$ and 8) were used. They were compared with an Epanechnikov kernel having optimal bandwidth. Although the results are not reported here, we also examined the performance of the wavelet convolution estimator. Its integrated mean squared error was significantly larger, mainly due to a larger variance.

The convolution kernel estimate does slightly better than the wavelet estimate, but this is not unexpected, since the optimal bandwidth is chosen from a continuum of possible values, whereas the tuning parameter $m$ is discrete.

5.2  Cross  validation.  Any  nonparametric  regression  method  is  highly  dependent  on  the 
tuning  parameter,  so  it  is  desirable  to  select  such  parameters  automatically.  The  problem  of 
selecting  m  is  rather  easier  than  the  bandwidth  selection  problem  for  kernel  estimators  (see,  e.g. 
Scaling function    Optimal MSE             Optimal m    Integrated bias²        Integrated variance
${}_3\varphi$       $2.64\times10^{-2}$     3            $1.07\times10^{-2}$     $1.56\times10^{-2}$
${}_5\varphi$       $3.09\times10^{-2}$     4            $3.54\times10^{-4}$     $3.05\times10^{-2}$
${}_6\varphi$       $3.08\times10^{-2}$     4            $2.04\times10^{-4}$     $3.06\times10^{-2}$
${}_8\varphi$       $3.07\times10^{-2}$     3            $1.51\times10^{-2}$     $1.56\times10^{-2}$

Table 1. The performance of the wavelet estimator $\hat r$ for various scaling functions ${}_N\varphi$ when $\sigma^2 = 0.2$. For comparison, the integrated mean squared error of the convolution kernel (Epanechnikov) estimate with optimal bandwidth 0.065 is $2.35\times10^{-2}$.

Härdle and Marron [HM85] in the regression case and Gregoire [Gr91] in the survival analysis case), since the bandwidth is essentially reduced to being of the form $2^{-m}$, where $m < \log_2 n$. A commonly used selection rule adapted to our setting is to choose $m$ as the minimizer of the cross validation function

$$CV(m) = n^{-1}\sum_{i=1}^n\big(Y_i - \hat r_{(i)}(t_i)\big)^2,$$

where $\hat r_{(i)}(t)$ is the leave-one-out estimator obtained by evaluating $\hat r$ (as a function of $m$ and $t$) with the $i$th data point removed. This gives reasonable results when applied to real and simulated data. In practice, for sample sizes between 100 and 200, we have found that it suffices to examine only $m = 3$, 4 and 5.
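The selection rule can be sketched for the Haar-kernel random-design estimator; this kernel is an assumption of the sketch, since the paper's examples use Daubechies scaling functions. For the Haar kernel the leave-one-out estimate at $X_i$ is the mean of the other responses in the dyadic cell of $X_i$. Points that are alone in their cell have no leave-one-out estimate and are skipped, a choice made for this illustration only. The function names are hypothetical.

```python
import numpy as np


def cv_score(x, y, m):
    """CV(m) = average of (Y_i - r_(i)(X_i))^2 for the Haar-kernel estimator.

    With the Haar kernel, the leave-one-out estimate r_(i)(X_i) is the mean of the
    other Y_j in the dyadic cell of X_i.  Points alone in their cell are skipped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cells = np.floor(2.0**m * x).astype(int)
    index = np.arange(len(x))
    sq_resid = []
    for i in index:
        others = (cells == cells[i]) & (index != i)
        if others.any():
            sq_resid.append((y[i] - y[others].mean()) ** 2)
    return float(np.mean(sq_resid)) if sq_resid else float("inf")


def select_m(x, y, candidates=(3, 4, 5)):
    """Return the minimizer of CV(m) over the candidate levels, and all scores."""
    scores = {m: cv_score(x, y, m) for m in candidates}
    return min(scores, key=scores.get), scores


# A step function that is constant on quarters of [0, 1] is fitted exactly at m = 2.
x = (np.arange(16) + 0.5) / 16
y = np.repeat([0.0, 1.0, 2.0, 3.0], 4)
best_m, all_scores = select_m(x, y, candidates=(1, 2))
```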

5.3  Examples.  To  illustrate  the  techniques  given  so  far,  and  to  add  to  the  earlier  discussion, 
we  now  consider  two  real  examples. 

The first example concerns the motorcycle impact data given in Härdle [Ha90] and presented in Figure 5.1. The observations consist of accelerometer readings taken through time in an experiment on the efficacy of crash helmets. This particular data set was also analyzed by Silverman [Sil80] using spline smoothing techniques. For several reasons the time points are not regularly spaced. It is of interest both to discern the general shape of the underlying acceleration curve and to draw inferences about its minimum and maximum values. Obviously, the observations are correlated and their variance is not constant, but for illustrative purposes we shall assume that the fixed design model holds.

We plotted the estimate $\hat r$ for various values of $m$, using the wavelet kernel based on ${}_6\varphi$. This choice of scaling function is reasonable according to the discussion following the statement of Theorem 3.3. We tried ${}_8\varphi$, which the finite sample comparisons suggested as being even better than ${}_6\varphi$, but
Figure 5.1. Plot of the motorcycle impact data together with the wavelet regression estimates $\hat r$ based on the scaling function ${}_6\varphi$ for $m = 3$ (dotted line), $m = 4$ (solid line) and $m = 5$ (dashed line). Cross validation selected the curve $m = 4$ as giving the best fit.


obtained a very poor fit. This poor performance of ${}_8\varphi$ is probably due to the greater instability of its variance; see Figure 3.2. Cross validation selected the curve $m = 4$ as giving the best fit: the function $CV(m)$ was found to be 534 at $m = 3$, 432 at $m = 4$ and 497 at $m = 5$. Inspecting Figure 5.1, one notices that $\hat r$ is considerably biased for $m = 3$; for $m = 5$ it detects the sharp drop in acceleration around 15 milliseconds, but has undesirable oscillations. The $m = 4$ estimate is clearly the best: it captures the general features of the underlying curve, except for a positive bias around 12 milliseconds.

Another example is presented in Figure 5.2. The data set is discussed in Example 3.4.5 of Eubank ([Eu88], p. 82) and represents the voltage drop in the battery of a guided missile motor during flight. In this example the assumptions of the fixed design model are much more reasonable. We find that there is an undesirable boundary effect in $\hat r$ at time 0. The time-reversed estimate $\hat r_{rev}$ has a similar problem at the right-hand end of the design interval. However, the symmetrized estimator $\hat r_{sym}$ discussed in Section 3 produces an acceptable fit. In fact, considering that $\hat r_{sym}$ uses a tuning parameter chosen from among only three values ($m = 3$, 4 and 5), it gives an outstanding result compared with other nonparametric regression estimates.

6.  Proofs 

Proof of Theorem 3.1. We apply Theorem 3.1 of Isogai [Iso90] with $2^m$ in the role of $m$ and $E_m$ in place of 3^. We need to check that the following conditions hold for each $x\in[0,1]$:
Figure 5.2. Plot of the voltage drop data (horizontal axis: time in seconds) together with the symmetrized wavelet regression estimate $\hat r_{sym}$ (solid line), $\hat r$ (dotted line) and $\hat r_{rev}$ (dashed line) for $m = 4$ and scaling function ${}_6\varphi$. Note that symmetrization has improved the estimate at the boundaries.

(i) $\sup_{m\ge 1}\int_0^1|E_m(x,y)|\,dy < \infty$;

(ii) $\int_0^1 E_m(x,y)\,dy\to 1$;

(iii) $\int_0^1|E_m(x,y)|\,I(|x - y| > \epsilon)\,dy\to 0$ for all $\epsilon > 0$;

(iv) $\sup_{y\in[0,1]}|E_m(x,y)| = O(2^m)$.

Using the assumption that $\varphi$ is 0-regular, we have

$$\int_0^1|E_m(x,y)|\,dy \le C_2\,2^m\int_0^1\big(1 + 2^m|x - y|\big)^{-2}\,dy, \qquad (6.1)$$

so (i) holds. (ii) follows by setting $f\equiv 1$ in equation (33) of Mallat [Mal89]. Using the indicator to control the integrand in (6.1), we see that the expression in (iii) is of order $O(2^{-m})\to 0$. Condition (iv) is immediate from the properties of $E_m$ discussed in Section 2. □

Proof of Theorem 3.2. Arguing along the lines of Gasser and Müller ([GM79], Appendix 1), and using the Lipschitz condition (2) on $r$, it can be seen that

$$\mathbb E\hat r(t) = \int_0^1 E_m(t,s)\,r(s)\,ds + O(n^{-\gamma}).$$
To complete the proof it suffices to show that

$$\int_0^1 E_m(t,s)\,r(s)\,ds = r(t) + O(\eta_m). \qquad (6.2)$$

This is demonstrated by applying an extension of a result of Schomburg [Sch90] to the function $g(x,y) = E_0(x,y)$; see Theorem A.1 in the Appendix. In Lemma A.2 we check that this function satisfies the conditions of Theorem A.1. First note that

$$\int_0^1 E_m(t,s)\,r(s)\,ds = (E_m r)(t)$$

for $m$ sufficiently large, since $t$ is in the interior of $[0,1]$ and $\varphi$ has compact support. Next, denoting the delta distribution centered at $t$ by $\delta_t$ and the duality between $H^\nu$ and $H^{-\nu}$ by $\langle\cdot,\cdot\rangle$ (see [Tre67], p. 331), one has

$$|r(t) - (E_m r)(t)| = |\langle r,\delta_t\rangle - \langle E_m r,\delta_t\rangle| = |\langle r,\delta_t - E_m\delta_t\rangle|. \qquad (6.3)$$

Here we have used the fact that $E_m$ can be defined on $H^{-\nu}$ and is a projection operator; see Meyer ([Mey90], p. 43). Applying Theorem A.1 with $2^m$ in the role of $n$ now gives the result.


Proof of Theorem 3.3. As in Gasser and Müller ([GM79], Appendix 2),

Var (r (:)) - /  E;;,(t,  s)k(s)  ds 

'  ^  Jo 

f  ^rn(ES)ds\  -  -  f  El,(t,  s)k{s)  ds 
,  =  1  ^  Ja,  '  ft  Jo 

”  I  1 

1  =  1  ^ 

(where  «,•  and  u,  belong  to  A,  ) 

-  Vi)k(v,)  -  E^O,  Ui)K(s,)^  . 

From (3.2), the number of terms contributing to the above sum is of order $O(n2^{-m})$. Hence, using (3.1), the bound $\sup_{t,s}|E_m(t,s)|\le C2^m$, and the Lipschitz property of $\kappa$ (which implies $\kappa(v_i) = \kappa(u_i) + O(1/n)$), the last displayed quantity is bounded by

0(-)0(n2-"')(p(n)2^  +  -2^-  +  -2^  sup  l£o(2'"r.  2'”i’,)  -  £o(2"’r,  2'"w,)|). 

^  n  n  n  I  / 
Using the compact support and Lipschitz properties of $\varphi$, one can show that $E_0(t,\cdot)$ is Lipschitz (uniformly in $t$), so that

$$\sup_i\big|E_0(2^m t, 2^m v_i) - E_0(2^m t, 2^m u_i)\big| = O\Big(\frac{2^m}{n}\Big).$$

Simplifying, we obtain

$$\Big|\mathrm{Var}(\hat r(t)) - \frac{\sigma^2}{n}\int_0^1 E_m^2(t,s)\,\kappa(s)\,ds\Big| = O(2^m\rho(n)) + o\Big(\frac{2^m}{n}\Big).$$


The proof is completed by appealing to the following lemma.

Lemma 6.1. (a) If $h:\mathbb R\to\mathbb R$ is continuous at $t$, then

$$\lim_{m\to\infty}2^{-m}\int_{\mathbb R}E_m^2(t^{(m)},s)\,h(s)\,ds = h(t)\,w_0^2.$$

(b) If $h:\mathbb R\to\mathbb R$ is bounded in a neighbourhood of $t$, then

$$\int_{\mathbb R}E_m^2(t,s)\,h(s)\,ds = O(2^m).$$

Proof. Since $2^m t^{(m)} = [2^m t]$ and $E_0(x + k, y + k) = E_0(x,y)$ for all $k\in\mathbb Z$,

$$\begin{aligned}
2^{-m}\int_{\mathbb R}E_m^2(t^{(m)},s)\,h(s)\,ds &= 2^m\int_{\mathbb R}E_0^2(2^m t^{(m)}, 2^m s)\,h(s)\,ds\\
&= 2^m\int_{\mathbb R}E_0^2(0, 2^m s - [2^m t])\,h(s)\,ds\\
&= \int_{\mathbb R}E_0^2(0,u)\,h(t^{(m)} + u2^{-m})\,du\\
&\to h(t)\int_{\mathbb R}E_0^2(0,u)\,du
\end{aligned}$$

as $m\to\infty$. Here we have used the continuity of $h$ at $t$ and the compact support assumption for $\varphi$, which implies that $E_0(0,\cdot)$ has compact support. This assumption and the fact that $\{\varphi(\cdot - k): k\in\mathbb Z\}$ is an orthonormal system in $L^2(\mathbb R)$ give

$$\int_{\mathbb R}E_0^2(v,u)\,du = \sum_{k\in\mathbb Z}\varphi^2(v - k),$$

so that $\int_{\mathbb R}E_0^2(0,u)\,du = w_0^2$, completing the proof of (a). The proof of (b) is similar. □
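Lemma 6.1 identifies the variance constant as $w_0^2 = \int E_0^2(0,u)\,du = \sum_k\varphi^2(k)$. For the Haar function this is 1, since only $\varphi(0) = 1$ is nonzero. For a continuous compactly supported scaling function satisfying the refinement equation $\varphi(x) = \sum_k c_k\varphi(2x - k)$, the integer values $\varphi(k)$ solve a small eigenvalue problem, as in the sketch below. The four-tap Daubechies filter is used only as an example, and the function name is hypothetical.

```python
import numpy as np


def scaling_function_at_integers(c):
    """Values phi(0), ..., phi(L-1) of the compactly supported scaling function
    solving phi(x) = sum_k c_k phi(2x - k), with sum_k c_k = 2.

    At the integers the refinement equation reads phi(j) = sum_l c_{2j-l} phi(l),
    so (phi(0), ..., phi(L-1)) is an eigenvector of A[j, l] = c_{2j-l} for the
    eigenvalue 1, normalized by the partition of unity sum_k phi(k) = 1.
    This requires phi to be continuous (it fails for the Haar filter).
    """
    c = np.asarray(c, dtype=float)
    L = len(c)
    A = np.zeros((L, L))
    for j in range(L):
        for l in range(L):
            if 0 <= 2 * j - l < L:
                A[j, l] = c[2 * j - l]
    eigvals, eigvecs = np.linalg.eig(A)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()


s3 = np.sqrt(3.0)
c_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 4.0  # four-tap Daubechies filter, sum = 2
phi = scaling_function_at_integers(c_d4)                 # approx [0, (1+s3)/2, (1-s3)/2, 0]
w0_squared = np.sum(phi**2)                               # approx 2.0
```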
Proof of Theorem 3.4. The Lipschitz condition on $r$ gives

so by Theorem 3.2 we have $\sqrt{n2^{-m}}\,(\mathbb E\hat r_d(t) - r(t))\to 0$. Write $\hat r_d(t) - \mathbb E\hat r_d(t)$ in the form $\sum_{i=1}^n w_i\varepsilon_i$, where $w_i = w_{in} = \int_{A_i}E_m(t^{(m)},s)\,ds$. We shall appeal to a central limit theorem for weighted sums of i.i.d. random variables (see Eicker [Ei63]) to obtain

$$\frac{\hat r_d(t) - \mathbb E\hat r_d(t)}{\sqrt{\mathrm{Var}(\hat r_d(t))}}\ \xrightarrow{\ \mathcal D\ }\ N(0,1).$$

To complete the proof we need to check the Lindeberg-type condition

$$\max_{1\le i\le n}|w_i|^2/\mathrm{Var}(\hat r_d(t))\to 0$$

and show that

$$\mathrm{Var}(\hat r_d(t)) \sim 2^m\sigma^2 w_0^2\kappa(t)/n.$$

From Theorem 3.3 and $\rho(n) = o(1/n)$ we have

n2“'”Var(r^(0)  =  a^wlK'it)  +0(1)  +  0(np(n))  +  C>(^)  a-u'5Ar(>). 

Also, using $\max_{1\le i\le n}|w_i|^2 = O(2^{2m}/n^2)$, we have

$$\max_{1\le i\le n}|w_i|^2/\mathrm{Var}(\hat r_d(t)) = \frac{O(2^m/n)}{n2^{-m}\,\mathrm{Var}(\hat r_d(t))}\to 0,$$

so the Lindeberg condition holds. □


Proof of Theorem 3.5. $\mathrm{Var}(\hat f(t)) = \mathrm{Var}(E_m(t,X))/n$ is bounded by

$$n^{-1}\int_{-\infty}^{\infty}E_m^2(t,x)\,f(x)\,dx = O(2^m/n)\to 0$$

by Lemma 6.1(b). The bias of $\hat f(t)$ is $(E_m f)(t) - f(t)$, which tends to zero by the same argument that was applied to $r$ at the end of the proof of Theorem 3.2. Thus $\hat f$ is pointwise consistent. Denote $g = rf$ and $\hat g(t) = n^{-1}\sum_{i=1}^n E_m(t,X_i)Y_i$, so that $\hat r = \hat g/\hat f$. It can be shown, along the lines in which $\mathrm{Var}(\hat f(t))$ was handled above but using the conditional variance formula, that $\mathrm{Var}(\hat g(t)) = \mathrm{Var}(E_m(t,X)Y)/n = O(2^m/n)$. Finally, the bias of $\hat g(t)$ is $(E_m g)(t) - g(t)\to 0$, and we conclude that $\hat g(t)$ is pointwise consistent, and hence so is $\hat r(t) = \hat g(t)/\hat f(t)$. □
Proof of Theorem 3.6. Replacing $t$ by $t^{(m)}$ in the proof of consistency of $\hat f(t)$ and using continuity of $f$ at $t$, we have that $\hat f_d(t)$ consistently estimates $f(t)$. Thus, by

$$\hat r_d - r = (\hat g_d - r\hat f_d)/\hat f_d,$$

we can reduce to considering $\sqrt{n2^{-m}}\,(\hat g_d - r\hat f_d)(t)$, which can be expressed as

$$\sqrt{\frac{2^{-m}}{n}}\sum_{i=1}^n(Z_{ni} - \mathbb EZ_{ni}) + \sqrt{n2^{-m}}\,\mathbb EZ_{n1}, \qquad (6.4)$$

where $Z_{ni} = E_m(t^{(m)},X_i)(Y_i - r(t))$. But

$$\begin{aligned}
\mathbb EZ_{n1} &= (E_m g)(t^{(m)}) - r(t)(E_m f)(t^{(m)})\\
&= (E_m g)(t^{(m)}) - g(t^{(m)}) - [g(t) - g(t^{(m)})]\\
&\quad - r(t)\big\{(E_m f)(t^{(m)}) - f(t^{(m)}) - [f(t) - f(t^{(m)})]\big\},
\end{aligned}$$

so the last term in (6.4) is of order $\sqrt{n2^{-m}}\,(O(\eta_m) + O(2^{-m\gamma}))$, where $\eta_m$ is given in Theorem 3.2, and we have used the Lipschitz conditions on $r$ and $f$ (which imply that $g$ is Lipschitz of the same order) to bound the terms inside the square brackets. It follows that $\sqrt{n2^{-m}}\,\mathbb EZ_{n1}\to 0$ by the assumption $n2^{-m(1+2\gamma)}\to 0$. To complete the proof we shall apply the Lindeberg-Feller Theorem to the first term in (6.4). First note that

$$2^{-m}\mathrm{Var}\big(\mathbb E(Z_{n1}\mid X_1)\big) = 2^{-m}\int_{\mathbb R}E_m^2(t^{(m)},x)\,(r(x) - r(t))^2 f(x)\,dx - 2^{-m}(\mathbb EZ_{n1})^2,$$

which tends to zero by Lemma 6.1. Next,

$$2^{-m}\mathbb E\big(\mathrm{Var}(Z_{n1}\mid X_1)\big) = 2^{-m}\int_{\mathbb R}E_m^2(t^{(m)},x)\,\mathrm{Var}(Y\mid X = x)\,f(x)\,dx \to \mathrm{Var}(Y\mid X = t)\,f(t)\,w_0^2,$$

again by Lemma 6.1. Thus, by the conditional variance formula

$$\mathrm{Var}(Z_{n1}) = \mathbb E\big(\mathrm{Var}(Z_{n1}\mid X_1)\big) + \mathrm{Var}\big(\mathbb E(Z_{n1}\mid X_1)\big),$$

we see that the variance of the first term in (6.4) tends to $\mathrm{Var}(Y\mid X = t)\,f(t)\,w_0^2$. It remains to check the Lindeberg condition, which amounts to showing that

$$\mathbb E\big(U_n^2\,I(|U_n| > \delta\sqrt n)\big)\to 0\quad\text{for all }\delta > 0,$$

where $U_n = (Z_{n1} - \mathbb EZ_{n1})/\sqrt{\mathrm{Var}(Z_{n1})}$. Suppose that $\mathbb E(Y^4\mid X = x)$ is bounded in a neighborhood of $t$; the general case of a bounded conditional moment of order $2 + \epsilon$ is similar. Then, by the Cauchy-Schwarz and Chebyshev inequalities,

$$\mathbb E\big(U_n^2\,I(|U_n| > \delta\sqrt n)\big) \le \big[\mathbb E(U_n^4)/(\delta^2 n)\big]^{1/2}.$$
Using the compact support property of $\varphi$,

$$\mathbb E(U_n^4) = O(2^{-2m})\,O(2^{4m})\int_{\mathbb R}I(|t - x|\le C2^{-m})\,\big[\mathbb E(Y^4\mid X = x) + C\big]\,f(x)\,dx = O(2^m),$$

where $C$ is a generic positive constant. Thus $\mathbb E\big(U_n^2\,I(|U_n| > \delta\sqrt n)\big) = O(\sqrt{2^m/n})\to 0$, as required. □

Proof of Theorem 3.7. The reader should have a copy of Eubank and Speckman [ES91] on hand before attempting this proof. Using the inequality (Jaffard and Meyer [JM89], p. 104)

<  C^^'-y'-txpi-C.yix  -  k2-J\).  X  e  !R  k  €  Sj  and  t<2q  -2. 

where $C_1$ and $C_2$ are generic constants that are independent of $k$, the conclusion of Lemma 2 of Eubank and Speckman [ES91] becomes

Ik'  -  {Trr.gryw  <  il."'  -  ;  ii  +  (Cj /v/^)2'” C3 11  r  -  (7„r)|l. 

The theorem now follows as in Eubank and Speckman [ES91] by applying the inequality

OC 

<  00 

y  =  l  k€.Sj 


for $r\in H_0^\nu$ (Theorem 2 of Jaffard and Meyer). □

Proof of Theorem 4.1. From (4.1) we get the following expansion:

$$\hat\alpha(t) = \int_0^1 E_m(t,s)\,\alpha(s)J_n(s)\,ds + \int_0^1 E_m(t,s)\,\frac{J_n(s)}{Y_n(s)}\,dM_n(s). \qquad (6.5)$$

Since the last integral is a zero-mean martingale evaluated at 1, we have

$$\begin{aligned}
\mathbb E\hat\alpha(t) &= \mathbb E\int_0^1 E_m(t,s)\,\alpha(s)J_n(s)\,ds\\
&= \mathbb E\int_0^1 E_m(t,s)\,\alpha(s)\,(J_n(s) - 1)\,ds + \int_0^1 E_m(t,s)\,\alpha(s)\,ds.
\end{aligned} \qquad (6.6)$$

The last term in (6.6) is the same as (6.2) with $r$ replaced by $\alpha$. Using the Cauchy-Schwarz inequality and Lemma 6.1, the first term in (6.6) is seen to be of order $2^{m/2}O(\delta_n^{1/2})$. □
Proof of Theorem 4.2. First note that $\hat\alpha_d(t) - \alpha(t)$ can be written as

$$\begin{aligned}
&\int_0^1 E_m(t^{(m)},s)\,\alpha(s)\,(J_n(s) - 1)\,ds + \Big(\int_0^1 E_m(t^{(m)},s)\,\alpha(s)\,ds - \alpha(t^{(m)})\Big)\\
&\quad + \big(\alpha(t^{(m)}) - \alpha(t)\big) + \int_0^1 E_m(t^{(m)},s)\,\frac{J_n(s)}{Y_n(s)}\,dM_n(s).
\end{aligned} \qquad (6.7)$$

Along the lines of the previous proof we see that the first term in (6.7) is of order $2^{m/2}O(\delta_n^{1/2})$ and the second term is of order $O(\eta_m)$. The third term is of order $O(2^{-m\gamma})$ since $\alpha$ is Lipschitz of order $\gamma$. The second moment of the stochastic integral is $n^{-1}\int_0^1 E_m^2(t^{(m)},s)\,\alpha(s)\,r_n(s)\,ds$, where $r_n(s) = n\,\mathbb E[J_n(s)/Y_n(s)]$. It follows along the lines of the proof of Lemma 6.1, using $\alpha r_n$ in place of $h$, and by our assumptions on $(r_n)$ and $\tau$, that

$$\mathbb E\Big[\int_0^1 E_m(t^{(m)},s)\,\frac{J_n(s)}{Y_n(s)}\,dM_n(s)\Big]^2 = \frac{2^m}{n}\frac{\alpha(t)}{\tau(t)}\,w_0^2 + 2^m o(n^{-1}).$$

The second part of the theorem is proved in a similar fashion, except using part (b) of Lemma 6.1. □


Proof of Theorem 4.3. By our assumptions, $\sqrt{n2^{-m}}\,(\hat\alpha_d(t) - \alpha(t))$ is asymptotically equivalent to $\sqrt{n2^{-m}}\int_0^1 E_m(t^{(m)},s)\,J_n(s)/Y_n(s)\,dM_n(s)$, which is the value at 1 of a martingale whose quadratic variation at 1 is asymptotically equivalent to $2^{-m}\int_0^1 E_m^2(t^{(m)},s)\,\alpha(s)/\tau(s)\,ds$. The previous proof gives that the latter quantity tends to $\alpha(t)w_0^2/\tau(t)$. The result follows using Rebolledo's martingale central limit theorem; cf. Ramlau-Hansen [RH83]. □

Appendix

An extension of Schomburg's Theorem and its wavelet application

Schomburg's [Sch90] original result gives the rate of convergence of certain sequences of type $\delta$ to the delta distribution centered at the origin in $H^\nu(\mathbb R^d)$. We need to extend this result to deal with approximations of the delta distribution centered at $t\in\mathbb R^d$. As in Schomburg, we allow $d\ge 1$, although we really only need $d = 1$. The sign of $\nu$ is reversed from Condition (1). Let $\nu < -d/2$ and $g(\cdot,t)\in H^\nu(\mathbb R^d)$ for all $t$, and define the sequence $(g_n(\cdot,t))_{n\ge 1}\subset H^\nu(\mathbb R^d)$ for each $t$ by

$$\langle g_n(\cdot,t),\phi\rangle = \langle g(\cdot,nt),\phi(\cdot/n)\rangle\quad\text{for }\phi\in S(\mathbb R^d),$$

where $S(\mathbb R^d)$ is the space of rapidly decreasing test functions; see Hörmander ([Ho89], p. 160). For a classical function $g$ one has $g_n(s,t) = n^d g(ns,nt)$. The Fourier transform of a function $h\in L^1(\mathbb R^d)$ is defined by $\hat h(\xi) = \int_{\mathbb R^d}e^{-ix\cdot\xi}h(x)\,dx$, $\xi\in\mathbb R^d$. In this appendix $\hat h$ denotes the Fourier transform rather than an estimator of $h$.
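Theorem A.1. Suppose that

$$\sup_t\|g(\cdot,t)\|_\nu < \infty \qquad (A.1.1)$$

and for some $\gamma > 0$ there exists a neighbourhood $U$ of 0 in $\mathbb R^d$ such that

$$|\xi|^{-\gamma}\big(\hat g(\xi,t) - e^{-it\cdot\xi}\big) \qquad (A.1.2)$$

belongs to $L^\infty(U)$ for each $t\in\mathbb R^d$. Then

$$\|g_n(\cdot,t) - \delta_t\|_\nu = \begin{cases}
O(n^{\nu + d/2}) & \text{if } -\nu < \gamma + d/2,\\
O(n^{-\gamma}\sqrt{\log n}) & \text{if } -\nu = \gamma + d/2,\\
O(n^{-\gamma}) & \text{if } -\nu > \gamma + d/2,
\end{cases}$$

as $n\to\infty$.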

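Proof. Clearly one may take $U$ as the unit ball in $\mathbb R^d$. Noting that $\hat g_n(\xi,t) = \hat g(\xi/n, nt)$, we have

$$\begin{aligned}
\|g_n(\cdot,t) - \delta_t\|_\nu^2 &= \int_{\mathbb R^d}(1 + |\xi|^2)^\nu\,|\hat g_n(\xi,t) - e^{-it\cdot\xi}|^2\,d\xi\\
&= \int_{\mathbb R^d}(1 + |\xi|^2)^\nu\,|\hat g(\xi/n,nt) - e^{-it\cdot\xi}|^2\,d\xi\\
&= n^d\int_{\mathbb R^d}(1 + n^2|x|^2)^\nu\,|\hat g(x,nt) - e^{-inx\cdot t}|^2\,dx.
\end{aligned}$$

Now split the integration into the regions $|x|\ge 1$ and $|x| < 1$. Since $\nu < 0$,

$$n^d\int_{|x|\ge 1}(1 + n^2|x|^2)^\nu\,|\hat g(x,nt) - e^{-inx\cdot t}|^2\,dx \le 2n^{d+2\nu}\int_{|x|\ge 1}|x|^{2\nu}|\hat g(x,nt)|^2\,dx + 2n^{d+2\nu}\int_{|x|\ge 1}|x|^{2\nu}\,dx = O(n^{2\nu + d})$$

by (A.1.1), and

$$n^d\int_{|x|<1}(1 + n^2|x|^2)^\nu\,|\hat g(x,nt) - e^{-inx\cdot t}|^2\,dx \le Cn^d\int_{|x|<1}(1 + n^2|x|^2)^\nu\,|x|^{2\gamma}\,dx$$

by (A.1.2) with $\xi = x$ and $t$ set to $nt$. The remainder of the proof is routine integration; see Schomburg [Sch90] for details. □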

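Lemma A.2. The function $g(x,y) = E_0(x,y)$ satisfies Conditions (A.1.1) and (A.1.2) of Theorem A.1.

Proof. Noting that

$$E_0(s,t) = \sum_{k\in\mathbb Z}\varphi(t - k)\,\varphi(s - k),$$

one has

$$\begin{aligned}
\hat E_0(\xi,t) &= \int_{\mathbb R}E_0(s,t)\,e^{-is\xi}\,ds = \sum_{k\in\mathbb Z}\varphi(t - k)\int_{\mathbb R}\varphi(s - k)\,e^{-is\xi}\,ds\\
&= \hat\varphi(\xi)\sum_{k\in\mathbb Z}\varphi(t - k)\,e^{-ik\xi} = \hat\varphi(\xi)\sum_{k\in\mathbb Z}\varphi(t + k)\,e^{ik\xi}.
\end{aligned}$$

Changing $k$ to $k - [t]$ and setting $u = t - [t]$, we have

$$\sum_{k\in\mathbb Z}\varphi(t + k)\,e^{ik\xi} = e^{-i[t]\xi}\sum_{k\in\mathbb Z}\varphi(u + k)\,e^{ik\xi}.$$

Since $\varphi$ has compact support, (A.1.1) holds if $\varphi\in H^\nu$ for $\nu < 0$. But $\varphi\in L^2(\mathbb R)\subset H^\nu$ for $\nu < 0$, as required. Next, by (2.4) and again since $\varphi$ has compact support, we have

$$\sum_{k\in\mathbb Z}\varphi(u + k)\,e^{ik\xi} = 1 + \sum_{k\in\mathbb Z}\varphi(u + k)\,(e^{ik\xi} - 1) = 1 + O(|\xi|)$$

as $\xi\to 0$. Thus, by Condition (6),

$$\hat E_0(\xi,t) = (1 + O(|\xi|))\,e^{-i[t]\xi}e^{-iu\xi}(1 + O(|\xi|)) = e^{-it\xi}(1 + O(|\xi|)),$$

so (A.1.2) holds. □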